-
Notifications
You must be signed in to change notification settings - Fork 329
Add a column participant
to room_memberships
table
#18068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
participant
to room_membership
tableparticipant
to room_memberships
table
Requesting @anoadragon453's feedback on this as he has context. |
Delayed events complement test failure looks like a flake? |
@anoadragon453 just wondering if I could get your input on this to ensure it aligns with what we discussed in #18040? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@H-Shay apologies for the slow turn-around on this one. I've been swamped with other work :(
I've made some large suggestions below, I'm afraid, but overall this is heading in a good direction!
txn: LoggingTransaction, last_room_id: str | ||
) -> Optional[str]: | ||
sql = """ | ||
SELECT room_id from room_memberships WHERE room_id > ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that room_memberships
is a table that is always be appended to, and thus always changing under you. It is ordered by its event_stream_ordering
column. So, the only way to traverse it while the system is running, without leaving gaps, is to iterate using the event_stream_ordering
column.
Note: room_memberships
only has rows deleted from it when a room is purged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we have:
- a query to pull out a room ID
- a query that pulls out all users that have ever been joined to that room
- a query per-user per-room that updates all previous entries that match that user/room combination
I think we can instead do this in one query that processes a batch of room_membership
rows all at once. Instead of saving the current room_id
for the batch job, start with the currently max event_stream_ordering
row and work backwards in batches of say 1000.
Constrain your query to the current event_stream_ordering
- BATCH_SIZE
. Then within that, UPDATE all rows based on data in the events
table. Then save the new event_stream_ordering - BATCH_SIZE
to your background job.
Now the table can continue to grow without things changing from underneath you, as historical data is only (rarely) deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look - I have rewritten the query as you requested. I did test it on a demo server with some trivial data and it worked, but do make sure I've got it correct as I don't often write SQL queries that aren't mostly simple.
synapse/storage/schema/main/delta/89/01_add_column_participant_room_memberships_table.sql
Outdated
Show resolved
Hide resolved
complement failures look like a flake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly there now, just a couple of minor things. Thanks for being patient!
@anoadragon453 just checking this is still on your radar? No rush, just want to make sure it didn't get lost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge apologies @H-Shay, this one indeed fell through the cracks.
Some comments below on the historical background job, but otherwise LGTM.
@anoadragon453 thanks for the review, I just had a question about this statement:
I think what that is saying is that if the user has sent a message at any point in the room's history they will be set as a participant, unless they are currently not in the room - is that correct thus far? Because if they aren't currently in the room, they will be marked as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anoadragon453 thanks for the review, I just had a question about this statement:
This background update will always mark those rows as participant=false if the user isn't currently joined to the room. And it will mark participant=true if they sent a message at any point in the room's history, not just when they were joined that one time.
I think what that is saying is that if the user has sent a message at any point in the room's history they will be set as a participant, unless they are currently not in the room - is that correct thus far?
That's correct!
Because if they aren't currently in the room, they will be marked as
participant=false
, and if they are currently in the room and they ever sent a message they will be marked asparticipant=true
, regardless of whether they have joined/left in the interm? This is how I would expect it to work but I think there is an edge case you are pointing out and I am missing - would you be willing to elaborate?
This harkens back to #18068 (comment), where you could get information about whether they were a participant at the time. However, you had indicated that wasn't too useful for T&S purposes, and I think I had forgotten that.
At this point, I would just mention that the data shouldn't be relied on to produce historical participation data.
@anoadragon453 thanks for the answer and thanks for bearing with this - I know you are busy :) I've made the changes so hopefully we are close to done with this. |
To allow to easily adding more test cases.
Allow user2 to send state events into the room.
Even if the type of the state event tries to replicate that of a non-state message or encrypted event.
Users can send stickers and not be considered a participant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM now (assuming tests pass)! Thank you for your hard work and patience @H-Shay.
I added some commits that parameterised the test cases (making it trivial to add more), as well as added a conditional that the event sent by the user is not a state event.
I also added a test case showing that users can send other event types (such as stickers) into the room and not be marked as a participant.
Before I merge, can you confirm that you're still happy to constrain participation to only m.room.message
and m.room.encrypted
? I'm wondering if event types like m.reaction
, m.sticker
, m.call.invite
and m.poll.start
should be added as well.
@anoadragon453 that is correct, thank you so much! |
Wonderful, thanks again for your work here! |
Follow on from #18068 Currently the subquery in `UPDATE` is pointless, as it will still just update all `room_membership` rows. Instead, we should look at the current membership event ID (which is easily retrieved from `local_current_membership`). We also add a `AND NOT participant` to noop the `UPDATE` when the `participant` flag is already set. cc @H-Shay
AND room_memberships.event_stream_ordering <= ? | ||
AND room_memberships.event_stream_ordering > ?; | ||
""" | ||
batch = int(stream_token) - _POPULATE_PARTICIPANT_BG_UPDATE_BATCH_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional that we're not using the batch_size
parameter?
This background update currently does not autoscale down. I wonder if this might be part of the cause of pain for many SQLite users: #18325
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not. I missed that in the review. I'll write a quick PR to use that, good spot!
Adds a column
participant
toroom_memberships
table to track whether a user has participated in a room - participation is defined as having sent am.room.message
orm.room.encrypted
event into the room.Related to #18040, the approach there to determine room participation was deemed too inefficient and adding this column was the recommend remedy.